By: Meghan O'Malley
#https://www.cdc.gov/
#The libraries I'll be using
import pandas as pd
import numpy as np
import altair as alt
pd.options.display.max_columns = 100
%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import ticker
import seaborn
We are using pandas to read a csv file from the CDC.
df = pd.read_csv('https://data.cdc.gov/api/views/muzy-jte6/rows.csv?accessType=DOWNLOAD',
parse_dates=['Week Ending Date'])
df.head()
| Jurisdiction of Occurrence | MMWR Year | MMWR Week | Week Ending Date | All Cause | Natural Cause | Septicemia (A40-A41) | Malignant neoplasms (C00-C97) | Diabetes mellitus (E10-E14) | Alzheimer disease (G30) | Influenza and pneumonia (J09-J18) | Chronic lower respiratory diseases (J40-J47) | Other diseases of respiratory system (J00-J06,J30-J39,J67,J70-J98) | Nephritis, nephrotic syndrome and nephrosis (N00-N07,N17-N19,N25-N27) | Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified (R00-R99) | Diseases of heart (I00-I09,I11,I13,I20-I51) | Cerebrovascular diseases (I60-I69) | COVID-19 (U071, Multiple Cause of Death) | COVID-19 (U071, Underlying Cause of Death) | flag_allcause | flag_natcause | flag_sept | flag_neopl | flag_diab | flag_alz | flag_inflpn | flag_clrd | flag_otherresp | flag_nephr | flag_otherunk | flag_hd | flag_stroke | flag_cov19mcod | flag_cov19ucod | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | United States | 2020 | 1 | 2020-01-04 | 60167.0 | 55016.0 | 846.0 | 11567.0 | 1827.0 | 2532.0 | 1560.0 | 3501.0 | 1066.0 | 1095.0 | 667.0 | 14195.0 | 3108.0 | 3.0 | 3.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | United States | 2020 | 2 | 2020-01-11 | 60719.0 | 55732.0 | 867.0 | 11962.0 | 1941.0 | 2564.0 | 1530.0 | 3708.0 | 1035.0 | 1094.0 | 675.0 | 13903.0 | 3183.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | United States | 2020 | 3 | 2020-01-18 | 59347.0 | 54511.0 | 831.0 | 11706.0 | 1819.0 | 2490.0 | 1486.0 | 3524.0 | 991.0 | 1123.0 | 649.0 | 13586.0 | 3255.0 | 3.0 | 2.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | United States | 2020 | 4 | 2020-01-25 | 59147.0 | 54399.0 | 830.0 | 11882.0 | 1861.0 | 2516.0 | 1487.0 | 3397.0 | 978.0 | 1109.0 | 688.0 | 13602.0 | 3179.0 | 2.0 | 2.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | United States | 2020 | 5 | 2020-02-01 | 58820.0 | 54009.0 | 811.0 | 11959.0 | 1827.0 | 2477.0 | 1419.0 | 3311.0 | 980.0 | 1072.0 | 661.0 | 13456.0 | 3080.0 | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
We want to be looking at the states. By doing this we are considering the unique values.
df['Jurisdiction of Occurrence'].value_counts().sort_index()
Alabama 60 Alaska 60 Arizona 60 Arkansas 60 California 60 Colorado 60 Connecticut 60 Delaware 60 District of Columbia 60 Florida 60 Georgia 60 Hawaii 60 Idaho 60 Illinois 60 Indiana 60 Iowa 60 Kansas 60 Kentucky 60 Louisiana 60 Maine 60 Maryland 60 Massachusetts 60 Michigan 60 Minnesota 60 Mississippi 60 Missouri 60 Montana 60 Nebraska 60 Nevada 60 New Hampshire 60 New Jersey 60 New Mexico 60 New York 60 New York City 60 North Carolina 60 North Dakota 60 Ohio 60 Oklahoma 60 Oregon 60 Pennsylvania 60 Puerto Rico 60 Rhode Island 60 South Carolina 60 South Dakota 60 Tennessee 60 Texas 60 United States 60 Utah 60 Vermont 60 Virginia 60 Washington 60 West Virginia 60 Wisconsin 60 Wyoming 60 Name: Jurisdiction of Occurrence, dtype: int64
As shown above, New York City was in the list and so was the United States. Because these are not states we will take them out.
unitedStatesRows = df['Jurisdiction of Occurrence'] == 'United States'
nycRows = df['Jurisdiction of Occurrence'] == 'New York City'
goodRows = df[~unitedStatesRows & ~nycRows]
goodRows['Jurisdiction of Occurrence'].value_counts().sort_index()
Alabama 60 Alaska 60 Arizona 60 Arkansas 60 California 60 Colorado 60 Connecticut 60 Delaware 60 District of Columbia 60 Florida 60 Georgia 60 Hawaii 60 Idaho 60 Illinois 60 Indiana 60 Iowa 60 Kansas 60 Kentucky 60 Louisiana 60 Maine 60 Maryland 60 Massachusetts 60 Michigan 60 Minnesota 60 Mississippi 60 Missouri 60 Montana 60 Nebraska 60 Nevada 60 New Hampshire 60 New Jersey 60 New Mexico 60 New York 60 North Carolina 60 North Dakota 60 Ohio 60 Oklahoma 60 Oregon 60 Pennsylvania 60 Puerto Rico 60 Rhode Island 60 South Carolina 60 South Dakota 60 Tennessee 60 Texas 60 Utah 60 Vermont 60 Virginia 60 Washington 60 West Virginia 60 Wisconsin 60 Wyoming 60 Name: Jurisdiction of Occurrence, dtype: int64
allDeathsPerState_df = pd.pivot_table(data = goodRows,
index = 'Week Ending Date',
columns = 'Jurisdiction of Occurrence',
values = 'All Cause'
)
allDeathsPerState_df.tail()
| Jurisdiction of Occurrence | Alabama | Alaska | Arizona | Arkansas | California | Colorado | Connecticut | Delaware | District of Columbia | Florida | Georgia | Hawaii | Idaho | Illinois | Indiana | Iowa | Kansas | Kentucky | Louisiana | Maine | Maryland | Massachusetts | Michigan | Minnesota | Mississippi | Missouri | Montana | Nebraska | Nevada | New Hampshire | New Jersey | New Mexico | New York | North Carolina | North Dakota | Ohio | Oklahoma | Oregon | Pennsylvania | Puerto Rico | Rhode Island | South Carolina | South Dakota | Tennessee | Texas | Utah | Vermont | Virginia | Washington | West Virginia | Wisconsin | Wyoming |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Week Ending Date | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2021-01-23 | 1591.0 | 73.0 | 2310.0 | 934.0 | 8724.0 | 948.0 | 588.0 | 202.0 | 139.0 | 5530.0 | 2498.0 | 197.0 | 326.0 | 2545.0 | 1383.0 | 654.0 | 654.0 | 1119.0 | 1056.0 | 335.0 | 1330.0 | 1476.0 | 2172.0 | 885.0 | 942.0 | 1569.0 | 229.0 | 379.0 | 769.0 | 292.0 | 1948.0 | 444.0 | 2934.0 | 133.0 | 121.0 | 2702.0 | 1104.0 | 739.0 | 3692.0 | 195.0 | 266.0 | 556.0 | 170.0 | 2149.0 | 5997.0 | 487.0 | 121.0 | 1994.0 | 1256.0 | 120.0 | 1247.0 | 105.0 |
| 2021-01-30 | 1365.0 | 59.0 | 1867.0 | 798.0 | 7288.0 | 846.0 | 464.0 | 183.0 | 131.0 | 5018.0 | 2070.0 | 207.0 | 233.0 | 2405.0 | 1189.0 | 518.0 | 635.0 | 904.0 | 868.0 | 350.0 | 1219.0 | 1259.0 | 1644.0 | 814.0 | 779.0 | 1369.0 | 180.0 | 320.0 | 668.0 | 260.0 | 1895.0 | 365.0 | 2662.0 | 109.0 | 124.0 | 2239.0 | 878.0 | 664.0 | 3311.0 | 135.0 | 192.0 | 1088.0 | 151.0 | 1872.0 | 4909.0 | 460.0 | 94.0 | 1507.0 | 1179.0 | 77.0 | 1095.0 | 95.0 |
| 2021-02-06 | 1137.0 | 47.0 | 1450.0 | 645.0 | 5591.0 | 662.0 | 258.0 | 152.0 | 104.0 | 4429.0 | 1742.0 | 193.0 | 220.0 | 2103.0 | 999.0 | 468.0 | 544.0 | 708.0 | 707.0 | 262.0 | 1038.0 | 1172.0 | 1537.0 | 669.0 | 655.0 | 1078.0 | 155.0 | 256.0 | 569.0 | 249.0 | 1679.0 | 299.0 | 2215.0 | 89.0 | 103.0 | 1924.0 | 603.0 | 500.0 | 2663.0 | 79.0 | 195.0 | 1064.0 | 118.0 | 1414.0 | 4000.0 | 349.0 | 89.0 | 1383.0 | 927.0 | 59.0 | 827.0 | 77.0 |
| 2021-02-13 | 957.0 | 37.0 | 1187.0 | 522.0 | 4423.0 | 521.0 | 82.0 | 126.0 | 73.0 | 4019.0 | 1265.0 | 207.0 | 208.0 | 1938.0 | 817.0 | 399.0 | 420.0 | 572.0 | 535.0 | 269.0 | 947.0 | 1056.0 | 1385.0 | 542.0 | 523.0 | 902.0 | 134.0 | 232.0 | 514.0 | 204.0 | 1417.0 | 253.0 | 1945.0 | 85.0 | 34.0 | 1492.0 | 496.0 | 386.0 | 2388.0 | 44.0 | 121.0 | 915.0 | 105.0 | 1086.0 | 3242.0 | 324.0 | 78.0 | 1161.0 | 807.0 | 35.0 | 797.0 | 76.0 |
| 2021-02-20 | 708.0 | 30.0 | 1000.0 | 453.0 | 3057.0 | 505.0 | 17.0 | 85.0 | 64.0 | 3384.0 | 850.0 | 184.0 | 193.0 | 1718.0 | 641.0 | 333.0 | 409.0 | 382.0 | 336.0 | 216.0 | 769.0 | 992.0 | 1290.0 | 522.0 | 420.0 | 769.0 | 97.0 | 198.0 | 373.0 | 203.0 | 1322.0 | 182.0 | 1840.0 | 69.0 | 49.0 | 1005.0 | 318.0 | 294.0 | 2097.0 | 16.0 | 27.0 | 754.0 | 78.0 | 902.0 | 2393.0 | 117.0 | 98.0 | 1056.0 | 746.0 | NaN | 724.0 | 63.0 |
Death count by state starting in 2020. Although there are many lines we can still notice patterns in the trend lines. This is important because this graph can be compared to other years and we can see how the trends compare.
allStates_df = allDeathsPerState_df.loc['2020-01-01':,['Alabama','Alaska','Arizona','Arkansas','California','Colorado','Connecticut','Delaware','District of Columbia','Florida','Georgia','Hawaii','Idaho','Illinois','Indiana','Iowa','Kansas','Kentucky','Louisiana','Maine','Maryland','Massachusetts','Michigan','Minnesota','Mississippi','Missouri','Montana','Nebraska','Nevada','New Hampshire','New Jersey','New Mexico','New York','North Carolina','North Dakota','Ohio','Oklahoma','Oregon','Pennsylvania','Puerto Rico','Rhode Island','South Carolina','South Dakota','Tennessee','Texas','Utah','Vermont','Virginia','Washington','West Virginia','Wisconsin','Wyoming']]
f,ax = plt.subplots(figsize=(24,16))
allStates_df.plot(ax=ax,lw=3)
ax.set_ylim((1,11500))
ax.set_ylabel('COVID-19 Deaths')
ax.legend(title='State',loc='upper left')
ax.axhline(10,c='k',ls='--',lw=1) # Suppressed counts
<matplotlib.lines.Line2D at 0x1d19d977670>
Because 2021 has just started there is data here. But I would just like to look at 2020...
allDeathsPerStateByYear_df = pd.pivot_table(data = goodRows,
index = 'MMWR Year',
columns = 'Jurisdiction of Occurrence',
values = 'All Cause'
)
allDeathsPerStateByYear_df.tail()
| Jurisdiction of Occurrence | Alabama | Alaska | Arizona | Arkansas | California | Colorado | Connecticut | Delaware | District of Columbia | Florida | Georgia | Hawaii | Idaho | Illinois | Indiana | Iowa | Kansas | Kentucky | Louisiana | Maine | Maryland | Massachusetts | Michigan | Minnesota | Mississippi | Missouri | Montana | Nebraska | Nevada | New Hampshire | New Jersey | New Mexico | New York | North Carolina | North Dakota | Ohio | Oklahoma | Oregon | Pennsylvania | Puerto Rico | Rhode Island | South Carolina | South Dakota | Tennessee | Texas | Utah | Vermont | Virginia | Washington | West Virginia | Wisconsin | Wyoming |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MMWR Year | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2020 | 1209.811321 | 94.792453 | 1475.226415 | 722.811321 | 6141.433962 | 909.962264 | 717.320755 | 207.150943 | 142.075472 | 4649.641509 | 1963.924528 | 230.150943 | 314.754717 | 2460.018868 | 1496.245283 | 680.452830 | 595.037736 | 1060.943396 | 1076.716981 | 299.075472 | 1141.018868 | 1317.773585 | 2205.433962 | 996.716981 | 749.169811 | 1453.471698 | 229.509434 | 376.773585 | 599.188679 | 259.188679 | 1824.773585 | 432.641509 | 2283.603774 | 1714.188679 | 168.905660 | 2749.660377 | 883.943396 | 769.301887 | 2985.735849 | 565.000000 | 232.452830 | 1131.283019 | 193.452830 | 1690.264151 | 4840.943396 | 423.698113 | 117.943396 | 1520.830189 | 1203.773585 | 481.924528 | 1194.245283 | 106.113208 |
| 2021 | 1311.428571 | 64.000000 | 1800.000000 | 747.142857 | 7223.285714 | 791.428571 | 387.428571 | 170.142857 | 117.428571 | 4816.142857 | 1970.142857 | 206.142857 | 271.571429 | 2307.857143 | 1170.142857 | 546.571429 | 602.000000 | 874.285714 | 847.000000 | 305.285714 | 1151.428571 | 1289.285714 | 1809.285714 | 794.857143 | 764.142857 | 1283.571429 | 183.857143 | 312.857143 | 649.428571 | 269.285714 | 1777.000000 | 357.857143 | 2537.142857 | 105.714286 | 93.714286 | 2245.571429 | 815.857143 | 607.428571 | 3116.428571 | 124.428571 | 194.857143 | 865.000000 | 144.285714 | 1720.142857 | 4829.428571 | 393.142857 | 104.857143 | 1578.571429 | 1097.142857 | 119.833333 | 1042.571429 | 92.857143 |
allDeathsPerStateByYear_df.drop([2021],axis=0,inplace=True)
allDeathsPerStateByYear_df
| Jurisdiction of Occurrence | Alabama | Alaska | Arizona | Arkansas | California | Colorado | Connecticut | Delaware | District of Columbia | Florida | Georgia | Hawaii | Idaho | Illinois | Indiana | Iowa | Kansas | Kentucky | Louisiana | Maine | Maryland | Massachusetts | Michigan | Minnesota | Mississippi | Missouri | Montana | Nebraska | Nevada | New Hampshire | New Jersey | New Mexico | New York | North Carolina | North Dakota | Ohio | Oklahoma | Oregon | Pennsylvania | Puerto Rico | Rhode Island | South Carolina | South Dakota | Tennessee | Texas | Utah | Vermont | Virginia | Washington | West Virginia | Wisconsin | Wyoming |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MMWR Year | ||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2020 | 1209.811321 | 94.792453 | 1475.226415 | 722.811321 | 6141.433962 | 909.962264 | 717.320755 | 207.150943 | 142.075472 | 4649.641509 | 1963.924528 | 230.150943 | 314.754717 | 2460.018868 | 1496.245283 | 680.45283 | 595.037736 | 1060.943396 | 1076.716981 | 299.075472 | 1141.018868 | 1317.773585 | 2205.433962 | 996.716981 | 749.169811 | 1453.471698 | 229.509434 | 376.773585 | 599.188679 | 259.188679 | 1824.773585 | 432.641509 | 2283.603774 | 1714.188679 | 168.90566 | 2749.660377 | 883.943396 | 769.301887 | 2985.735849 | 565.0 | 232.45283 | 1131.283019 | 193.45283 | 1690.264151 | 4840.943396 | 423.698113 | 117.943396 | 1520.830189 | 1203.773585 | 481.924528 | 1194.245283 | 106.113208 |
f,ax = plt.subplots(figsize=(20,30))
allDeathsPerStateByYear_df.plot(kind='barh',ax=ax)
ax.set_ylabel(None)
ax.set_title('MMWR Year\nsince Jan. 2020')
ax.set_xlabel('Number of deaths')
Text(0.5, 0, 'Number of deaths')
As shown above, each state has a differnt number of deaths. The bar chart is a nice way to understand that each state varries. This can we expanded by looking at other factors as well including the states population. And all this information can be compared to any COVID-19 information.
Continuing, I wanted to look at the region where I currently live. Colorado by far has had the most amount of people die in 2020, but not all states are created equal. Population is forsure a factor. This makes me wonder the difference in population amoung Utah and New Mexico. I would guess that they are pretty close based on the following graph.
import altair as alt
import pandas as pd
#Let's Plot States in the Mountain West Time Zone
data = pd.DataFrame({'states': ['Colorado','Utah','New Mexico','Wyoming'],
'deaths': [909.962264,423.698113,432.641509,106.113208]})
alt.Chart(data).mark_bar().encode(
x='states',
y='deaths',
)
Covid-19 has made 2020 a historic year. This graph allows us to evaluate 'out of all deaths how many of those were COVID?'.
#this includes the US as a whole
alt.Chart(df ).mark_point().encode(
y='COVID-19 (U071, Multiple Cause of Death)',
x='All Cause'
)
Now I wanted to evaluate the two states that I live in. I can learn from the graph that Illinois has more weekly deaths than Colorado. In fact, the lowest number of deaths in Illinois is still more than the highest reported week in Colorado.
alt.Chart(allDeathsPerState_df).mark_point().encode(
y='Colorado',
x='Illinois'
)
Overall, I learned more about the number of deaths in America by looking at these graphs.